Goto

Collaborating Authors

 formal model


Bridging LLM Planning Agents and Formal Methods: A Case Study in Plan Verification

Ramani, Keshav, Tawosi, Vali, Alamir, Salwa, Borrajo, Daniel

arXiv.org Artificial Intelligence

We introduce a novel framework for evaluating the alignment between natural language plans and their expected behavior by converting them into Kripke structures and Linear Temporal Logic (LTL) using Large Language Models (LLMs) and performing model checking. We systematically evaluate this framework on a simplified version of the PlanBench plan verification dataset and report on metrics like Accuracy, Precision, Recall and F1 scores. Our experiments demonstrate that GPT-5 achieves excellent classification performance (F1 score of 96.3%) while almost always producing syntactically perfect formal representations that can act as guarantees. However, the synthesis of semantically perfect formal models remains an area for future exploration.


Large Language Bayes

Domke, Justin

arXiv.org Artificial Intelligence

Many domain experts do not have the time or expertise to write formal Bayesian models. This paper takes an informal problem description as input, and combines a large language model and a probabilistic programming language to define a joint distribution over formal models, latent variables, and data. A posterior over latent variables follows by conditioning on observed data and integrating over formal models. This presents a challenging inference problem. We suggest an inference recipe that amounts to generating many formal models from the large language model, performing approximate inference on each, and then doing a weighted average. This is justified and analyzed as a combination of self-normalized importance sampling, MCMC, and importance-weighted variational inference. Experimentally, this produces sensible predictions from only data and an informal problem description, without the need to specify a formal model.


A formal implementation of Behavior Trees to act in robotics

Ingrand, Felix

arXiv.org Artificial Intelligence

Behavior Trees (BT) are becoming quite popular as an Acting component of autonomous robotic systems. We propose to define a formal semantics to BT by translating them to a formal language which enables us to perform verification of programs written with BT, as well as runtime verification while these BT execute. This allows us to formally verify BT correctness without requiring BT programmers to master formal language and without compromising BT most valuable features: modularity, flexibility and reusability. We present the formal framework we use: Fiacre, its langage and the produced TTS model; Tina, its model checking tools and Hippo, its runtime verification engine. We then show how the translation from BT to Fiacre is automatically done, the type of formal LTL and CTL properties we can check offline and how to execute the formal model online in place of a regular BT engine. We illustrate our approach on two robotics applications, and show how BT could benefit of other features available in the Fiacre formal framework (state variables, time, etc).


One Stack, Diverse Vehicles: Checking Safe Portability of Automated Driving Software

Nenchev, Vladislav

arXiv.org Artificial Intelligence

Integrating an automated driving software stack into vehicles with variable configuration is challenging, especially due to different hardware characteristics. Further, to provide software updates to a vehicle fleet in the field, the functional safety of every affected configuration has to be ensured. These additional demands for dependability and the increasing hardware diversity in automated driving make rigorous automatic analysis essential. This paper addresses this challenge by using formal portability checking of adaptive cruise controller code for different vehicle configurations. Given a formal specification of the safe behavior, models of target configurations are derived, which capture relevant effects of sensors, actuators and computing platforms. A corresponding safe set is obtained and used to check if the desired behavior is achievable on all targets. In a case study, portability checking of a traditional and a neural network controller are performed automatically within minutes for each vehicle hardware configuration. The check provides feedback for necessary adaptations of the controllers, thus, allowing rapid integration and testing of software or parameter changes.


An Algorithm-Centered Approach To Model Streaming Data

Hinder, Fabian, Vaquet, Valerie, Komnick, David, Hammer, Barbara

arXiv.org Artificial Intelligence

Besides the classical offline setup of machine learning, stream learning constitutes a well-established setup where data arrives over time in potentially non-stationary environments. Concept drift, the phenomenon that the underlying distribution changes over time poses a significant challenge. Yet, despite high practical relevance, there is little to no foundational theory for learning in the drifting setup comparable to classical statistical learning theory in the offline setting. This can be attributed to the lack of an underlying object comparable to a probability distribution as in the classical setup. While there exist approaches to transfer ideas to the streaming setup, these start from a data perspective rather than an algorithmic one. In this work, we suggest a new model of data over time that is aimed at the algorithm's perspective. Instead of defining the setup using time points, we utilize a window-based approach that resembles the inner workings of most stream learning algorithms. We compare our framework to others from the literature on a theoretical basis, showing that in many cases both model the same situation. Furthermore, we perform a numerical evaluation and showcase an application in the domain of critical infrastructure.


Using Formal Models, Safety Shields and Certified Control to Validate AI-Based Train Systems

Gruteser, Jan, Roßbach, Jan, Vu, Fabian, Leuschel, Michael

arXiv.org Artificial Intelligence

The certification of autonomous systems is an important concern in science and industry. The KI-LOK project explores new methods for certifying and safely integrating AI components into autonomous trains. We pursued a two-layered approach: (1) ensuring the safety of the steering system by formal analysis using the B method, and (2) improving the reliability of the perception system with a runtime certificate checker. This work links both strategies within a demonstrator that runs simulations on the formal model, controlled by the real AI output and the real certificate checker. The demonstrator is integrated into the validation tool ProB. This enables runtime monitoring, runtime verification, and statistical validation of formal safety properties using a formal B model. Consequently, one can detect and analyse potential vulnerabilities and weaknesses of the AI and the certificate checker. We apply these techniques to a signal detection case study and present our findings.


Navigating the sociotechnical labyrinth: Dynamic certification for responsible embodied AI

Bakirtzis, Georgios, Tubella, Andrea Aler, Theodorou, Andreas, Danks, David, Topcu, Ufuk

arXiv.org Artificial Intelligence

Sociotechnical requirements shape the governance of artificially intelligent (AI) systems. In an era where embodied AI technologies are rapidly reshaping various facets of contemporary society, their inherent dynamic adaptability presents a unique blend of opportunities and challenges. Traditional regulatory mechanisms, often designed for static -- or slower-paced -- technologies, find themselves at a crossroads when faced with the fluid and evolving nature of AI systems. Moreover, typical problems in AI, for example, the frequent opacity and unpredictability of the behaviour of the systems, add additional sociotechnical challenges. To address these interconnected issues, we introduce the concept of dynamic certification, an adaptive regulatory framework specifically crafted to keep pace with the continuous evolution of AI systems. The complexity of these challenges requires common progress in multiple domains: technical, socio-governmental, and regulatory. Our proposed transdisciplinary approach is designed to ensure the safe, ethical, and practical deployment of AI systems, aligning them bidirectionally with the real-world contexts in which they operate. By doing so, we aim to bridge the gap between rapid technological advancement and effective regulatory oversight, ensuring that AI systems not only achieve their intended goals but also adhere to ethical standards and societal values.


PROSKILL: A formal skill language for acting in robotics

Ingrand, Félix

arXiv.org Artificial Intelligence

Acting is an important decisional function to ensure proper deliberation on an autonomous system (Ingrand and Ghallab, 2017). It often sits between planning and the platform, but unlike planning it is an online process, which must stay reactive to the dynamic of the environment and the platform and cannot devote resources to long computations and complex searches. Acting often relies on models, called skills, which specify how to perform actions (as an operational model), while the action models used for planning are more what is abstractly needed to perform the action (as a descriptive model) (Ghallab et al., 2016). The most basic skills need to connect to the commands made available by the functional level to the acting component, call them asynchronously, get execution status and result, but it also needs means to receive exogenous events as they occur in the environment. This action/command dispatching may also rely on preconditions and invariants checking, interruptions, temporal constraints, etc. Above the basic skills one often finds more complex skills, similar to programs with control structures to allow for local choices and local recoveries with test, branching, looping, parallel and asynchronous execution. Considering the expected functionalities of an acting component, its skill language/framework should provide the following features: Support for Validation and Verification (V&V). Notwithstanding the other functionalities, this is the feature the work presented in this paper focuses on. One cannot only rely on basic skills connecting to the robot commands, one also needs some programming primitives (e.g., test, branching, loop). 1


Consciousness as a logically consistent and prognostic model of reality

Vityaev, Evgenii

arXiv.org Artificial Intelligence

The work demonstrates that brain might reflect the external world causal relationships in the form of a logically consistent and prognostic model of reality, which shows up as consciousness. The paper analyses and solves the problem of statistical ambiguity and provides a formal model of causal relationships as probabilistic maximally specific rules. We suppose that brain makes all possible inferences from causal relationships. We prove that the suggested formal model has a property of an unambiguous inference: from consistent premises we infer a consistent conclusion. It enables a set of all inferences to form a consistent model of the perceived world. Causal relationships may create fixed points of cyclic inter-predictable properties. We consider the "natural" classification introduced by John St. Mill and demonstrate that a variety of fixed points of the objects' attributes forms a "natural" classification of the external world. Then we consider notions of "natural" categories and causal models of categories, introduced by Eleanor Rosch and Bob Rehder and demonstrate that fixed points of causal relationships between objects attributes, which we perceive, formalize these notions. If the "natural" classification describes the objects of the external world, and "natural" concepts the perception of these objects, then the theory of integrated information, introduced by G. Tononi, describes the information processes of the brain for "natural" concepts formation that reflects the "natural" classification. We argue that integrated information provides high accuracy of the objects identification. A computer-based experiment is provided that illustrates fixed points formation for coded digits.


Dynamic Certification for Autonomous Systems

Communications of the ACM

While gridworlds represent rather simplistic modules, they are quite powerful in demonstrating scalable behavior. Simply, an agent that fails to behave safely in such simple environments is also unlikely to behave safely in the real world.26 A parametric MDP can model the composition of these three modules into a single socio-technical system. The UAV can land and take off from anywhere in the region. It will lose connection and land-in-place with probability p1 (opaque UAV in Figure 2) and remain grounded until it reestablishes connection with probability p2.